Fault Tolerant Scheduling in Distributed Networks

Authors

  • Jon B. Weissman
  • David Womack
Abstract

We present a model of application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on full replication of all parallel application components in a distributed wide-area environment, in which each replica is independently scheduled at a different site. A system architecture for coordinating the replicas is described. The fault tolerance mechanism is being added to a wide-area scheduler prototype in the Legion parallel processing system. A performance evaluation of the fault-tolerant scheduler, and a comparison with the traditional means of fault tolerance, checkpoint-recovery, are planned.
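The replication idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' architecture or any Legion interface: the names `run_replicated`, `AllReplicasFailed`, and `demo_task`, the site labels, and the use of local threads to stand in for remote sites are all assumptions made for illustration. One replica of the task is started per site, and the first replica to finish successfully supplies the result, so the computation survives as long as at least one site does.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(RuntimeError):
    """Raised when every replica fails (hypothetical name)."""


def run_replicated(task: Callable[[str], T], sites: Sequence[str]) -> T:
    """Run one replica of `task` per site and return the first success.

    Threads stand in for independently scheduled remote sites; this is an
    illustrative assumption, not the scheduler described in the paper.
    """
    if not sites:
        raise ValueError("at least one site is required")

    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = {pool.submit(task, site): site for site in sites}
        for future in as_completed(futures):
            site = futures[future]
            try:
                result = future.result()
            except Exception as exc:
                # This replica failed; remember why and wait for the others.
                errors[site] = exc
                continue
            # A replica succeeded: cancel replicas that have not started yet.
            # Replicas already running finish before the pool shuts down.
            for other in futures:
                other.cancel()
            return result

    raise AllReplicasFailed(f"all replicas failed: {errors}")


def demo_task(site: str) -> str:
    """Toy workload in which the site labelled 'site-a' always fails."""
    if site == "site-a":
        raise ConnectionError("site-a is unreachable")
    return f"result from {site}"
```

Calling `run_replicated(demo_task, ["site-a", "site-b"])` returns the result computed at `site-b` even though `site-a` fails. The cost of this design is that every site does the full work, which is the trade-off against checkpoint-recovery that the abstract says the authors plan to evaluate.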


Similar resources

Real-time Fault-tolerant Scheduling Algorithm for Distributed Computing Systems

This article proposes a distributed real-time fault-tolerant model, a priority-based real-time fault-tolerant algorithm, and a computational architecture for distributed real-time fault tolerance. With this model, the problem of scheduling a weighted Directed Acyclic Graph (DAG) in a distributed computing system for high reliability can be solved in the presence of multiprocessor faults. When som...

Real-time Fault-tolerant Scheduling in Heterogeneous Distributed Systems

This work was supported by the National Defense Pre-research Foundation of China. Abstract: Some work has been done on real-time fault-tolerant scheduling algorithms. However, all of it is based on homogeneous distributed systems or multiprocessor systems, which have identical processors. This paper presents two fault-tolerant scheduling algorithms, RTFTNO and RTFTRC, for periodic real-t...

A New Proactive Fault Tolerant Approach for Scheduling in Computational Grid

Grid Computing provides non-trivial services to users and aggregates the power of widely distributed resources. Computational grids solve large scale scientific problems using distributed heterogeneous resources. The Grid Scheduler must select proper resources for executing the tasks with less response time and without missing the deadline. There are various reasons such as network failure, ove...

An Efficient Fault Tolerant Scheduling Approach for Computational Grid

Grid computing serves as an important technology to facilitate distributed computation. Computational grids solve large-scale scientific problems using heterogeneous, geographically distributed resources. Problems like dispatching and scheduling of tasks are considered major issues in a computational grid environment. The Grid Scheduler must select proper resources for executing the tasks with l...

Publication date: 1996